
    Forecasting in Database Systems

    Time series forecasting is a fundamental prerequisite for decision-making processes and crucial in a number of domains such as production planning and energy load balancing. In the past, forecasting was often performed by statistical experts in dedicated software environments outside of current database systems. However, forecasts are increasingly required by non-expert users or have to be computed fully automatically without any human intervention. Furthermore, we can observe an ever increasing data volume and the need for accurate and timely forecasts over large multi-dimensional data sets. As most data subject to analysis is stored in database management systems, a rising trend addresses the integration of forecasting inside a DBMS. Yet, many existing approaches follow a black-box style and try to keep changes to the database system as minimal as possible. While such approaches are more general and easier to realize, they miss significant opportunities for improved performance and usability. In this thesis, we introduce a novel approach that seamlessly integrates time series forecasting into a traditional database management system. In contrast to flash-back queries that allow a view on the data in the past, we have developed a Flash-Forward Database System (F2DB) that provides a view on the data in the future. It supports a new query type - a forecast query - that enables forecasting of time series data and is automatically and transparently processed by the core engine of an existing DBMS. We discuss necessary extensions to the parser, optimizer, and executor of a traditional DBMS. We furthermore introduce various optimization techniques for three different types of forecast queries: ad-hoc queries, recurring queries, and continuous queries. First, we ease the expensive model creation step of ad-hoc forecast queries by reducing the amount of processed data with traditional sampling techniques. 
Second, we decrease the runtime of recurring forecast queries by materializing models in a specialized index structure. However, a large number of time series as well as high model creation and maintenance costs require a careful selection of such models. Therefore, we propose a model configuration advisor that determines a set of forecast models for a given query workload and multi-dimensional data set. Finally, we extend forecast queries with continuous aspects, allowing an application to register a query once with our system. As new time series values arrive, we send notifications to the application based on predefined time and accuracy constraints. All of our optimization approaches aim to increase the efficiency of forecast queries while ensuring high forecast accuracy.
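The first optimization above, easing the expensive model-creation step of ad-hoc forecast queries by processing less data, can be illustrated with a small sketch. This is not the thesis's actual implementation: the hand-rolled simple exponential smoothing, the grid of smoothing parameters, and the choice of a recent window as the "sample" are all illustrative assumptions standing in for the sampling techniques the thesis describes.

```python
# Sketch: estimate a smoothing parameter on a sample of the series
# instead of its full history, then forecast with the cheap model.
# All names and the sampling strategy are illustrative assumptions.

def ses_forecast(series, alpha):
    """One-step-ahead forecast with simple exponential smoothing."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def sse(series, alpha):
    """Sum of squared one-step-ahead errors for a given alpha."""
    level, total = series[0], 0.0
    for x in series[1:]:
        total += (x - level) ** 2
        level = alpha * x + (1 - alpha) * level
    return total

def fit_alpha(series, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Pick the alpha with the lowest in-sample error."""
    return min(grid, key=lambda a: sse(series, a))

full_history = [100 + (i % 7) for i in range(10_000)]  # toy series
sample = full_history[-500:]   # parameter estimation on a sample only
alpha = fit_alpha(sample)      # cheap: 500 points instead of 10,000
print(ses_forecast(full_history, alpha))
```

The expensive step, parameter search, touches only the sample; applying the fitted model to the full series is a single linear pass.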

    Gene Amplification in Tumor Cells : Developed De Novo or Adopted from Stem Cells

    Gene amplifications have been known for several decades as physiological processes in amphibians and flies, e.g., during eggshell development in Drosophila, and as part of pathological processes in humans, specifically in tumors and drug-resistant cells. The long-held belief that physiological gene amplification does not occur in humans was, however, fundamentally questioned by findings that showed gene amplification in human stem cells. We hypothesize that the physiological and the pathological, i.e., tumor-associated, processes of gene amplification share the same underlying mechanism at their outset. Re-replication has been reported both in the context of tumor-related genome instability and during restricted time windows in Drosophila development, where it causes the known developmental gene amplification in Drosophila. There is also growing evidence that gene amplification and re-replication occur in human stem cells. It appears likely that stem cells utilize a re-replication mechanism that was developed early in evolution as a powerful tool to increase gene copy numbers very efficiently. Here, we show that, several decades ago, there was already evidence of gene amplification in non-tumor mammalian cells, but that it was not recognized as such at the time and therefore not interpreted accordingly. We give an overview of gene amplifications during normal mammalian development, discuss the possible mechanisms that enable gene amplification, and hypothesize how tumors adopted this capability for gene amplification.

    Transparent Forecasting Strategies in Database Management Systems

    Whereas traditional data warehouse systems assume that data is complete or has been carefully preprocessed, increasingly more data is imprecise, incomplete, and inconsistent. This is especially true in the context of big data, where massive amounts of data arrive continuously in real time from vast data sources. At the same time, modern data analysis involves sophisticated statistical algorithms that go well beyond traditional BI and, additionally, is increasingly performed by non-expert users. Both trends require transparent data mining techniques that efficiently handle missing data and present a complete view of the database to the user. Time series forecasting estimates future, not yet available, data of a time series and represents one way of dealing with missing data. Moreover, it enables queries that retrieve a view of the database at any point in time - past, present, and future. This article presents an overview of forecasting techniques in database management systems. After discussing possible application areas for time series forecasting, we give a short mathematical background of the main forecasting concepts. We then outline various general strategies for integrating time series forecasting inside a database and discuss some individual techniques from the database community. We conclude this article by introducing a novel forecasting-enabled database management architecture that natively and transparently integrates forecast models.
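The idea of a query that sees past, present, and future as one seamless series can be sketched as follows. This is a minimal illustration, not the article's architecture: the naive drift model, the function name, and the integer time axis are assumptions made for the example.

```python
# Sketch: a "transparent" range read that serves stored values where
# they exist and forecasts beyond the last actual value, so a query
# sees one seamless series over past, present, and future.
# The drift model and all names are illustrative assumptions.

def read_range(stored, start, end):
    """Return values for time points start..end, forecasting past the data."""
    n = len(stored)
    # naive drift forecast: last value plus the average historical step
    step = (stored[-1] - stored[0]) / (n - 1)
    out = []
    for t in range(start, end + 1):
        if t < n:
            out.append(stored[t])                        # past/present: stored
        else:
            out.append(stored[-1] + step * (t - n + 1))  # future: forecast
    return out

sales = [10, 12, 14, 16, 18]
print(read_range(sales, 2, 7))  # mixes actual and forecast values
```

The caller never distinguishes stored from forecast values; that is the transparency the article argues for.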

    Sample-Based Forecasting Exploiting Hierarchical Time Series

    Time series forecasting is challenging, as sophisticated forecast models are computationally expensive to build. Recent research has addressed the integration of forecasting inside a DBMS. One main benefit is that models can be created once and then repeatedly used to answer forecast queries. Forecast queries are often submitted at higher aggregation levels, e.g., forecasts of sales over all locations. To answer such a forecast query, we have two possibilities. First, we can aggregate all base time series (sales in Austria, sales in Belgium...) and create only one model for the aggregate time series. Second, we can create models for all base time series and aggregate the base forecast values. The second possibility may lead to higher accuracy, but it is usually too expensive due to the high number of base time series. However, we do not actually need all base models to achieve high accuracy; a sample of base models is enough. With this approach, we still achieve better accuracy than an aggregate model, very similar to using all models, but fewer models need to be created and maintained in the database. We further improve this approach when new actual values of the base time series arrive at different points in time. With each new actual value, we can refine the aggregate forecast and eventually converge towards the real actual value. Our experimental evaluation using several real-world data sets shows a high accuracy of our approaches and a fast convergence towards the optimal value with increasing sample sizes and increasing numbers of actual values, respectively.
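The two ideas above, scaling a sample of base forecasts up to the aggregate and refining the estimate as actual base values arrive, can be sketched in a few lines. This is a simplified stand-in for the paper's method: the scaling rule, the toy data, and all names are assumptions for illustration.

```python
import random

# Sketch: forecast the aggregate from a sample of base-series models,
# then refine the estimate as actual base values arrive.
# All names, the scaling rule, and the toy data are illustrative.

def sampled_total_forecast(forecasts, sample, n_series):
    """Scale the forecasts of a sample of base series up to all series."""
    return sum(forecasts[i] for i in sample) * n_series / len(sample)

def refined_total(forecasts, sample, n_series, arrived):
    """Use actual values where known; scale remaining sampled forecasts
    for the series whose actual values have not arrived yet."""
    remaining = [i for i in sample if i not in arrived]
    if remaining:
        rest = (sum(forecasts[i] for i in remaining)
                * (n_series - len(arrived)) / len(remaining))
    else:
        rest = 0.0
    return sum(arrived.values()) + rest

random.seed(1)
n = 200
actual_next = [100 + random.random() * 20 for _ in range(n)]  # true next values
forecasts = [a + random.gauss(0, 3) for a in actual_next]     # noisy base forecasts
sample = random.sample(range(n), 30)                          # 30 of 200 models

est = sampled_total_forecast(forecasts, sample, n)
true_total = sum(actual_next)
print(abs(est - true_total) / true_total)  # small relative error
```

Once actual values have arrived for every base series, `refined_total` returns the exact aggregate, which mirrors the convergence behavior the abstract describes.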

    F2DB: The Flash-Forward Database System

    Forecasts are important to decision-making and risk assessment in many domains. Since current database systems do not provide integrated support for forecasting, it is usually done outside the database system by specially trained experts using forecast models. However, integrating model-based forecasting as a first-class citizen inside a DBMS speeds up the forecasting process by avoiding data export and by applying database-related optimizations like reusing created forecast models. It especially allows subsequent processing of forecast results inside the database. In this demo, we present our prototype F2DB based on PostgreSQL, which allows for transparent processing of forecast queries. Our system automatically takes care of model maintenance when the underlying dataset changes. In addition, we offer optimizations to save maintenance costs and increase accuracy by using derivation schemes for multidimensional data. Our approach reduces the required expert knowledge by enabling arbitrary users to apply forecasting in a declarative way.

    Der Ort, die Identität, die Architektur (The Place, the Identity, the Architecture)


    Futterwert von Mais-Bohnen-Silagen: Stangen- und Feuerbohnen im Vergleich (Feed Value of Maize-Bean Silages: Comparing Pole Beans and Runner Beans)

    Silage maize, with its high energy density, is a major forage source for dairy cows, but its protein concentration is low. Therefore, intercropping systems with maize and climbing beans, which have the potential to improve the protein and energy supply from regionally grown roughage, have been investigated in field experiments since 2014. Four cultivars of Phaseolus vulgaris and two cultivars of P. coccineus were evaluated for their potential in intercropping with maize, and their crude nutrient contents were compared with those of maize. Furthermore, defined mixtures of two cultivars of P. vulgaris and two cultivars of P. coccineus with maize were produced to assess their digestibility in sheep. All intercrop silages (IS) had higher crude protein values than maize. The in vivo organic matter digestibility was higher for IS than for maize. The metabolisable energy for IS was also higher than for maize silage.

    Indexing forecast models for matching and maintenance

    Forecasts are important to decision-making and risk assessment in many domains. There has been recent interest in integrating forecast queries inside a DBMS. Answering a forecast query requires the creation of forecast models. Creating a forecast model is an expensive process and may require several scans over the base data as well as expensive operations to estimate model parameters. However, if forecast queries are issued repeatedly, answer times can be reduced significantly if forecast models are reused. Due to the possibly high number of forecast queries, existing models need to be found quickly. Therefore, we propose a model index that efficiently stores forecast models and allows for the efficient reuse of existing ones. Our experiments illustrate that the model index shows a negligible overhead for update transactions, but it yields significant improvements during query execution.
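The build-once, reuse-often pattern behind such a model index can be sketched as follows. This is a toy illustration, not the paper's index structure: the key layout, the class names, and the running-mean "model" are all assumptions for the example.

```python
# Sketch of a forecast-model index: fitted models are stored under a
# key describing the series they cover, so a repeated forecast query
# can reuse a model instead of re-estimating it from base data.
# The key layout and the toy "model" are illustrative assumptions.

class ModelIndex:
    def __init__(self):
        self._models = {}            # (table, column, series_key) -> model

    def lookup(self, key):
        return self._models.get(key)

    def store(self, key, model):
        self._models[key] = model

    def invalidate(self, key):       # e.g., after a large base-data change
        self._models.pop(key, None)

class MeanModel:
    """Toy model: forecasts the running mean; supports cheap updates."""
    def __init__(self, series):
        self.count = len(series)
        self.total = sum(series)

    def update(self, value):         # incremental maintenance on insert
        self.count += 1
        self.total += value

    def forecast(self):
        return self.total / self.count

index = ModelIndex()
key = ("sales", "amount", "region=AT")
if index.lookup(key) is None:                # expensive path: build once
    index.store(key, MeanModel([10, 20, 30]))
index.lookup(key).update(40)                 # cheap maintenance on insert
print(index.lookup(key).forecast())
```

Updates only touch the cached model's state, which is why index maintenance can stay cheap relative to rebuilding models on every query.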

    What constitutes a local public sphere? Building a monitoring framework for comparative analysis

    Despite the research tradition of analyzing public communication, local public spheres have been rather neglected by communication science, although they are crucial for social cohesion and democracy. Existing empirical studies of local public spheres are mostly case studies which implicitly assume that cities are alike. Based on a participatory-liberal understanding of democracy, we develop a theoretical framework, from which we derive a monitor covering structural, social, and spatial aspects of local communication to empirically compare local public spheres along four dimensions: (1) information, (2) participation, (3) inclusion, and (4) diversity. In a pilot study, we then apply our monitor to four German cities that are comparable in size and regional function (‘regiopolises’). The monitoring framework is built on local statistical data, some of which was provided by the cities, while some came from our own research. We show that the social structures and the normative assessment of the quality of local public spheres can vary among similar cities and between the four dimensions. We hope the innovative monitor prototype enables scholars and local actors to compare local public spheres across spaces, places, and time, and to investigate the impact of social change and digitalization on local public spheres.

    Forecasting the data cube

    Forecasting time series data is crucial in a number of domains such as supply chain management and display advertising. In these areas, the time series data to forecast is typically organized along multiple dimensions, leading to a high number of time series that need to be forecast. Most current approaches focus only on selecting and optimizing a forecast model for a single time series. In this paper, we explore how we can utilize time series at different dimensions to increase forecast accuracy and, optionally, reduce model maintenance overhead. Solving this problem is challenging due to the large space of possibilities and possibly high model creation costs. We propose a model configuration advisor that automatically determines the best set of models, a model configuration, for a given multi-dimensional data set. Our approach is based on a general process that iteratively examines more and more models and simultaneously controls the search space depending on the data set, model type, and available hardware. The final model configuration is integrated into F2DB, an extension of PostgreSQL, which processes forecast queries and maintains the configuration as new data arrives. We comprehensively evaluated our approach on real and synthetic data sets. The evaluation shows that our approach significantly increases forecast query accuracy while ensuring low model costs.
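The trade-off a configuration advisor navigates, accuracy gain versus model creation and maintenance cost, can be sketched as a greedy selection under a cost budget. This is a deliberately simplified stand-in for the paper's iterative search process; the candidate names, gains, costs, and the greedy ratio heuristic are all illustrative assumptions.

```python
# Sketch: pick a model configuration for a data cube by greedily
# choosing candidates with the best accuracy gain per unit of cost
# until a cost budget is exhausted. A stand-in for the paper's
# iterative search; all numbers and names are illustrative.

def advise(candidates, budget):
    """candidates: list of (name, accuracy_gain, cost) tuples."""
    chosen, spent = [], 0.0
    for name, gain, cost in sorted(candidates,
                                   key=lambda c: c[1] / c[2], reverse=True):
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen

candidates = [
    ("total",       5.0,  1.0),   # one model on the full aggregate
    ("per-country", 8.0,  4.0),   # models at the country level
    ("per-store",   9.0, 20.0),   # models for every base series
]
print(advise(candidates, budget=5.0))
```

Under this toy budget the fine-grained per-store configuration is rejected: its small extra accuracy does not justify its maintenance cost, which is the kind of decision the advisor automates.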